Online-Academy
Look, Read, Understand, Apply

Pandas - Introduction

Introduction:

In Python, pandas is one of the most popular libraries for data analysis and data manipulation. It makes it easy to work with structured data such as tables, spreadsheets, and database query results. Because of its powerful data-handling tools, pandas is widely used in data science, machine learning, statistics, and research. It is built on top of NumPy, which provides fast numerical operations.

Main purpose:
  • To read, clean, analyze, and manipulate data efficiently.
Main data structures in pandas:
  • Series -> A one-dimensional labeled array (similar to a column in Excel).
  • DataFrame -> A two-dimensional table with rows and columns (like an Excel sheet).
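To make the two structures above concrete, here is a minimal sketch. The names and ages are sample values chosen for illustration only.

```python
import pandas as pd

# A Series: one-dimensional values, each with a label in the index.
ages = pd.Series([20, 22, 21], index=["Ram", "Sita", "Hari"], name="Age")

# Look up a value by its label rather than by its position.
sita_age = ages["Sita"]

# A DataFrame: a two-dimensional table; each column is itself a Series.
people = pd.DataFrame({
    "Name": ["Ram", "Sita", "Hari"],
    "Age": [20, 22, 21],
})
age_column = people["Age"]

print(sita_age)                    # 22
print(type(age_column).__name__)   # Series
print(people.shape)                # (3, 2)
```

Selecting a single column from a DataFrame gives back a Series, which is why the two structures are usually learned together.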
Common things you can do with pandas:
  • Read data from CSV, Excel, SQL databases, JSON files
  • Clean messy data
  • Filter and sort information
  • Perform statistical analysis
  • Group and summarize data
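
Example: creating a DataFrame from a dictionary.

import pandas as pd

# Each key becomes a column name; each list holds that column's values.
data = {
    "Name": ["Ram", "Sita", "Hari"],
    "Age": [20, 22, 21]
}
df = pd.DataFrame(data)
print(df)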
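
The common tasks listed earlier (cleaning, filtering, sorting, statistics, grouping) can be sketched on a small in-memory table. The data here, including the "Department" column and the deliberately missing age, is hypothetical and exists only to give each step something to act on.

```python
import pandas as pd

# Hypothetical sample data; one Age value is missing on purpose.
df = pd.DataFrame({
    "Name": ["Ram", "Sita", "Hari", "Gita"],
    "Department": ["IT", "HR", "IT", "HR"],
    "Age": [20, 22, None, 24],
})

# Clean: drop rows that have a missing Age.
clean = df.dropna(subset=["Age"])

# Filter: keep only rows where Age is above 21.
older = clean[clean["Age"] > 21]

# Sort: order rows by Age, largest first.
by_age = clean.sort_values("Age", ascending=False)

# Statistics: average age across the cleaned data.
mean_age = clean["Age"].mean()

# Group and summarize: average age per department.
age_by_dept = clean.groupby("Department")["Age"].mean()

print(older["Name"].tolist())     # ['Sita', 'Gita']
print(by_age["Name"].tolist())    # ['Gita', 'Sita', 'Ram']
print(mean_age)                   # 22.0
print(age_by_dept.to_dict())      # {'HR': 23.0, 'IT': 20.0}
```

Dropping the incomplete row is only one way to clean data; filling the gap with fillna() is another option, and which one is appropriate depends on the dataset.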

Some basic functions of the DataFrame object:

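import pandas as pd

# Load the dataset. The examples below assume data.csv has "Age" and "Department" columns.
df = pd.read_csv("data.csv")

# In a notebook each result is displayed automatically; in a script, wrap the call in print().

# head(): shows the first 5 rows of the dataset (5 is the default).
df.head()
# tail(): shows the last 5 rows of the dataset (5 is the default).
df.tail()
# info(): prints a summary of the dataset: column names, data types, and non-null counts (which reveal missing values).
df.info()
# describe(): gives summary statistics (count, mean, std, min, max, quartiles) for numerical columns.
df.describe()
# shape: an attribute (no parentheses) holding the (rows, columns) tuple.
df.shape
# columns: an attribute holding the column names.
df.columns
# sort_values(): returns a copy sorted by a column (ascending by default); df itself is unchanged.
df.sort_values("Age")
# drop(): returns a copy without the given rows or columns; axis=1 means drop a column.
df.drop("Age", axis=1)
# groupby(): groups rows by a column and aggregates each group.
# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.0 and later.
df.groupby("Department").mean(numeric_only=True)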
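
The calls above need a data.csv file on disk. As a rough, self-contained sketch, the same steps can be tried against CSV text held in memory. The column names and rows below are made up for illustration and are not the contents of any real data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """Name,Department,Age
Ram,IT,20
Sita,HR,22
Hari,IT,21
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # (3, 3)
print(list(df.columns))    # ['Name', 'Department', 'Age']

# sort_values() and drop() return new DataFrames; assign the result to keep it.
sorted_df = df.sort_values("Age")
without_age = df.drop("Age", axis=1)

print(list(df.columns))           # original is unchanged: still has 'Age'
print(list(without_age.columns))  # ['Name', 'Department']

# Average of the numeric columns per department (the text column Name is skipped).
dept_means = df.groupby("Department").mean(numeric_only=True)
print(dept_means)
```

Because sort_values() and drop() leave the original DataFrame untouched, forgetting to assign the result is a common reason a change seems to "disappear".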